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Abstract 



a 

^ i We discuss a rejectionless global optimization technique which, while 

q ■ being technically similar to the recently introduced method of Ex- 

O | tremal Optimization, still relies on a physical analogy with a thermal- 

izing system. This method can be used at constant temperature or 
combined with annealing techniques, and is especially well suited for 
^ ■ studying the low temperature relaxation of complex systems as glasses 

*/"") ' and spin glasses. 
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1 Introduction 

The archetype of optimization approaches inspired by physics is arguably 
Simulated Annealing (SA) 0,0]. In SA, putative solutions of the problem 
at hand are organized in an 'energy landscape' similar to the configuration 
space of a physical system. This landscape is searched for the global op- 
timum using the Metropolis algorithm j|] while the temperature is slowly 
q , decreased during the simulation. The theoretical underpinning of SA is 

equilibrium statistical physics and the theory of Markov processes: at each 
temperature T, the Metropolis algorithm asymptotically samples the equi- 
librium Boltzmann distribution which, at T = 0, is supported on the set of 
ground states. 

It is well known to practitioners of SA that maintaining thermalization 
at arbitrarily low temperatures is practically impossible in landscapes with 



many local minima: The trajectories get trapped in subsets of configura- 
tion space (valleys or pockets) surrounding local minima, and most of the 
attempted moves are rejected. Thus, ergodicity is broken, and the final 
solution is just a 'good' local optimum, not the global optimum. 

To mitigate the effects of ergodicity breaking one must make sure that 
many different valleys are thoroughly explored. Ways to do so is to employ 
ensemble methods || or use non-monotonic temperature schedules, as done 
in bouncing ||, thermal cycling ||, and tempering Q. Another possibility 
is to select candidate moves from a broad distribution || §, occasionally 
producing the 'long jumps' in configuration space which are needed to escape 
the neighborhood of a local minimum. 

In this paper we discuss and test a 'thermal' Monte Carlo scheme, which 
we have chosen to call the Waiting Time Method or WTM. The WTM be- 



longs to a familiy of rejectionless or 'event driven' algorithms [ 10 [ which 



developed from the so-called n-fold method introduced by Bortz et al. [11]. 



From an algorithmic point of view, the WTM is very close to a recent op- 



timization method, Extremal Optimization (EO) [12, 13], notwithstanding 



the fact that the latter is inspired by the Bak-Sneppen model of biological 



evolution [14], rather than thermal annealing of a physical system. 

Rejectionless methods can offer much improved performance, especially 
in simulations of glassy systems at low temperature. Yet, they do not seem to 



have gained widespread acceptance [ 15 1 . We hope that the present analysis 



will contribute to improve the situation. 

As a prelude, we show the equivalence of the WTM with the Metropolis 
algorithm with regard to the equilibrium as well as to the dynamical (relax- 
ation) properties. Secondly, we analyze the performance of the WTM and 
identify the temperature region where it is computationally advantageous to 
use it. Finally we compare the performance of Extremal Optimization with 
the WTM run at constant temperature and with the WTM in combination 
with the Geman-Geman annealing schedule ]l6| , [lTj. The numerical analyses 
all use a 3D Ising spin glass model with short range Gaussian interactions 
as a test problem. 

2 The WTM algorithm 

The WTM is well suited for problems where N variables contribute to the 
cost function (energy) in an additive fashion. One such case is the short 
range Ising spin glass model: N spins, o~i = ±1 are placed on a cubic 
lattice, producing a total of 2^ configurations. Each configuration has N 



neighbors, which correspond to the N possible flips of a single spin. The 
energy of configuration a is given by 

» 3 

where the couplings Jy are independently drawn from a Gaussian distribu- 
tion of unit variance and zero mean if i and j point to adjacent lattice sites. 
Otherwise Jy = 0. 

For any configuration^] a, the WTM associates to each spin a stochastic 
variable ti which is the time at which this spin must flip, given that its local 
field J2j Jij a j stays put. A 'global time' value, which is initially set to zero, 
is stored in the variable £ g i bai- 

Let Aj be the energy change associated with the flip of <jj. Flipping 
times ti are initially assigned as 

t i = -T i logX l , (2) 

where the Xi are independent stochastic variables with a uniform distri- 
bution in the unit interval. Thus, each t% has an exponential probability 
distribution with average waiting time given by 

Ti = max(l,exp(Aj/T)). (3) 

After the initialization has been completed, the WTM iterates the following 
three steps: 

1. Flip the spin a m having the lowest flipping time. 

2. Update the global time: igiobai = t m - 

3. Generate from an exponential distribution with average t% fresh wait- 
ing times Si for the spin a m and its lattice neighbors (e.g. six other 
spins in a cubic lattice). The waiting times are added to the global 
time in order to obtain the updated flipping times: ti = tgiobai + Si. 

In order to identify the next move, one needs to search (and update) a 
database of size N storing the current flipping time of each degree of freedom 
as well as pointers to the z degrees of freedom interacting with it. Ordering 
in a balanced tree (a heap), with the shortest flipping time placed at the root 
node, requires a computational effort for the search and update procedure 
of order log N. Since the number of updates per spin flip is equal to z + 1, 
the computational overhead per spin flip is (z + 1) log N, independent of the 
temperature T. 

1 We will often leave the dependence on a understood. 



3 Theoretical considerations 

For completeness, we discuss below a few basic statistical properties of the 
WTM. The main results are derived analytically and then checked numeri- 
cally for illustration purposes. We also briefly describe Extremal Optimiza- 
tion, which is used in the performance comparison of the next section. 

Equilibrium distribution for the WTM 



It is simple to establish that the WTM fulfills detailed balance [10]. Consider 
two configurations, a and /3k, the latter obtained from the former by flipping 
spin k. Given that spin k is the next spin to flip, the probability that the 
system remains in state a for at least a time t is (by construction) 

q k (t)=exp(-t/T k ). (4) 

The rate R/3 kl a of probability flow from a to (3). is then 

Rfa,a = — = min(l, exp(-A fc /T)). (5) 

Tk 

Detailed balance follows since Rf3 k , a /Ra,p k = exp(— A^/T) = P eq (/3fc)/P e q(a)) 
where P eq (a) is the probability of visiting configuration a in equilibrium. 
As a consequence hereof [l^], the stationary distribution of the WTM is the 
Boltzmann distribution over the states of the system, as in the case of the 
Metropolis algorithm. 

Metropolis and WTM dynamics 

Since detailed balance is fulfilled, the WTM has the same equilibrium (Boltz- 
mann) distribution as the Metropolis algorithm. To an excellent approxima- 
tion, there is also a dynamical equivalence, in the sense that the transition 
probabilities from one state to another are very nearly the samef] in the two 
schemes. This equivalence simplifies a performance comparison consider- 
ably, since one only needs to compare the computer times spent on average 
to achieve a flip. 

2 If a number of waiting times n, which all equal one, happens to enter the heap in 
close succession, the WTM has a bias towards flipping the corresponding spins in the same 
order as the n are submitted. This feature hardly has any practical consequences and is 
neglected in the formulae. 



The WTM remains in a state a for a time at least equal to t with a 
'survival' probability given by 

N N 

Q?(t) = J] exp(-t/r k ) = exp(-t £ r' 1 ). (6) 

fc=i fc=i 

The probability that spin k be the first to flip is 

pf=[ 00 rk 1 Q^(t)dt = -^ I . (7) 

JO Ej Tj 

In the Metropolis algorithm, the transition probability P(/3fc, a) from a con- 
figuration a to one of its TV" neighbors 0k is 

P(f3 k ,a) = 1 min(l, exp(-A fc /T)). (8) 

Hence, the probability that spin fe be the first to flip is 

M _ P(0k,a) (Q) 

Since Eq. @j is identical to Eq. (§), the WTM and the Metropolis method 
generate the same Markov chain of spin flips. 

We also note that the intrinsic (or 'physical') time of the WTM approxi- 
mately corresponds to the number of MC steps in the Metropolis algorithm. 
The latter leaves state a unchanged for at least t MC steps with the prob- 
ability 

/ N \ tN 

Q5f(t) = [i-E p (fc'«)) • ( 10 ) 

If a is a local minimum all the A^ are positive and the sum in Eq. (|I0| ) 
is close to zero for small T. A similar conclusion applies to configurations 
close to a local minimum, which is where the algorithm will spend most of 
the computer time in a multi-minima landscape at low T. In this situation 
one easily obtains 

TV 

Q«(*) ~exp[-^min(l,exp(-A fe /T))]. (II) 

fe=l 



Hence, using the usual Monte Carlo step as time unit makes Eq. (|6|) and (|11|) 
identical. Note however that Eq. (11) is an approximation while Eq. <M) is 



exact. 
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Figure 1: The normalized density of states P(e) for a Gaussian Ising spin glass 
with N — 8 3 spins is sampled through 10 6 MC steps. The Metropolis algorithm and 
two variants of the WTM, which are based upon Eq. (ra) and Eq. (113), respectively, 
were utilized to obtain the data. See the main text for further details. 



Fig. [j] shows, for different temperatures, the density of states of the 
spin glass problem sampled with the WTM (dashed lines), the Metropolis 
algorithm (dotted lines) and finally the WTM with a different choice of 



rates, (henceforth "choice b") given in Eq. 12 and discussed below. In all 
cases we used 10 6 MC steps: At T = 1.2 and T = 1.0 this suffices to 
equilibrate the system (the discrepancies barely visible at high energies are 
more pronounced for smaller systems (not shown) and are likely to stem 
from a finite size effect). At T = 0.8 the WTM and Metropolis are still 
indistinguishable. The incipient lack of equilibration transpires from the 
fact that the data sampled by the WTM with choice b of rates now visibly 
differs from the other two sets at high energies. This difference is enhanced at 
T = 0.6, while at T = 0.4 the trajectories become trapped in different local 
minima and the three methods strongly differ due to insufficient statistics. 



Other choices of the average waiting time 

Within all possible choices of rates fulfilling detailed balance, Eq. (gj) stands 
out as the one which, besides producing the desired equilibrium distribu- 
tion, also guarantees dynamical equivalence with the Metropolis algorithm. 
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Figure 2: P(e) averaged over 10 4 runs on the same N = 8 3 gaussian spin glass. 
All runs started at the same configuration with e = —1.6200 and lasted only 100 
Monte Carlo steps. The two versions of the WTM refer to Eq. (§|) and Eq. (||). 

In systems like spin glasses, where the relaxation is dominated by partial 
equilibrations in configuration space traps, one nevertheless expects the dy- 
namics to be insensitive to the precise choice of rates. To check this, we 
considered the choice 

n = exp(A 4 /2T), (12) 

whose appeal mainly lies in the fact that the same expression for n applies 
regardless of the sign of Aj. 

In Fig. ^ we compare the densities of states sampled at T = 0.8 by 
the WTM with rates chosen according to Eq. [l^ and Eq. ||. The latter 
choice is, we recall, equivalent to the Metropolis scheme. The data shown 
are averages over 10000 very short runs. All these runs started from the 
same initial state, with energy per spin e = —1.6200, and lasted 100 MC 
steps, which is not even remotely sufficient for equilibration purposes. As 
anticipated, the densities of states sampled by the two methods are quite 
similar, even at very short time times. 



Extremal Optimization 

The WTM is in many respects algorithmically similar to Extremal Opti- 
We have, unsuccessfully, tried to map the two meth- 
and we also failed to produce an analytical descrip- 



mization [12, 13]. 

ods onto each other, 

tion of the stationary distribution belonging to EO. In short, with optimal 

choice of parameters for optimization, -Peo( £ ) is significantly broader than 

PwTM(e) ©. 

The quality of the contribution to the overall cost from a degree of free- 
dom is considered in EO as a fitness measure. The actual way in which 
the fitness is assigned is not rigidly specified, but reflects one's intuition and 



knowledge of the problem at hand. In our case, we use the local energy 



contribution —(TiJ2jJij 'j as the fitness /j of (Ti, in accordance with |2Cj. 
The following steps are then iterated: 

1. Rank the spins according to their fitness, and pick a spin with rank n 
with probability p ex n~ 7 . The exponent 7 is a free parameter which 
can be tuned for a given class of problems but which is not changed 
dynamically during the runs. 

2. Flip the chosen spin and calculate the ensuing change in the fitness of 
its neighbors. 

If 7 is very large, the least fit spin is chosen every time, and EO resembles 
the Bak-Sneppen model of evolution [14]. 

The major difference between EO and WTM is that only the rank (rel- 
ative magnitude) of the energy changes is important in the former method, 
while the absolute magnitudes are important in the latter. Secondly, EO 
chooses its next move probabilistically from a deterministic ranking of the 
/j. By way of contrast, the WTM chooses the next move deterministically 
from a list of waiting times produced by a stochastic algorithm. Thirdly, 
the ranking procedure in EO utilizes the current energy contribution, while 
the WTM looks ahead and generates the waiting times according to the 
projected energy change. 

4 Tests and results 

This section presents the salient features of the WTM algorithm, found by 
simulations of the 3D Gaussian spin glass model with N spins. We do not 
attempt a complete characterization, which would be tedious for those not 
interested in the particular model used. Likewise, the physical aspects of 
low temperature relaxation in spin glasses will be discussed elsewhere [pi] . 

Acceptance rates and intrinsic WTM time 

The 'intrinsic' or physical time elapsed in a WTM simulation determines 
the amount of configuration space explored and how close the system is to 
thermal equilibrium. Its linear relation to the number of Monte Carlo steps 
is displayed in the left panel of Fig. y for three different temperatures and 
a number of systems sizes. There is no discernible iV dependence, except 
at the lowest temperature, where the smallest systems, N = 5 3 and 7 3 , are 
seen to follow slightly steeper lines (longer times) pi]. 
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Figure 3: The left panel shows the elapsed intrinsic time in the WTM as a func- 
tion of the number of updates measured in MC steps. Results for three different 
temperatures and a number of system sizes are displayed. The right panel shows 
the acceptance rate, i.e the ratio of the number of flips to the number of attempted 
flips, for the Metropolis algorithm, with the box marking the temperature region - 
T < 1.3 - where the WTM is faster. 



The right panel of Fig. |3| shows, for three different system sizes, the em- 
pirical temperature dependence of the acceptance rate a(T) of the Metropo- 
lis algorithm, i.e the ratio of the number of flips to the number of attempted 
flips. Again, the size dependence is negligible. 

The slopes in the left panel are, for each temperature, nearly equal to 
the reciprocal acceptance rate l/a(T) obtained from the right panel. This 
confirms the correspondence between the WTM time t and the number of 
MC steps discussed in Section |3| 

Tuning of parameters 

In ergodic systems, and in particular in finite systems, the best-so-far energy 
per spin £bsf(£) seen in a simulation of length t must approach the ground 
state energy as t — > oo, regardless of the temperature. Nonetheless, for 
realistic values of t and all but the smallest systems, the ground state energy 
is never reached and one can meaningfully study the parameter dependence 



of a suitably averaged the best-so- far energy (fibsf (*))■ 

In applying the WTM to the spin glass problem, we averaged over many 
runs with the same length and different Jij. We observed a clear temperature 
dependence of the resulting (e^f ) an d defined an 'optimal' parameter T opt 
as the temperature minimizing (ebsf) after 10 3 iV spin flips. Empirically, this 



T op t is independent of the runtime through several orders of magnitude [19]. 
It is however dependent on the system size in the following way: For N = 
5 3 ,6 3 ,8 3 , 10 3 and 12 3 , we find T opt = 1.0,0.9,0.7,0.65 and 0.6, respectively. 
While we cannot make any theoretical statements regarding EO, the 
behavior with respect to (ebsf) is very much the same, with 7 in lieu of T. 
However, running EO on the same systems as above showed a much weaker 
N dependence: though djopt/dN < is observed, 7 = 1.1 produces good 
results for N < 12 3 . Compared to the temperature in the WTM, 7 opt is 
insensitive to changes in the system size. 

Runtime considerations 

A ball park figure for the temperature T& where the WTM is faster than 
Metropolis can be obtained by noticing that the generation of waiting times, 
see Eq. (||), is the most time consuming part of the algorithm. For each 
performed spin flip the WTM generates z + 1 = 7 new waiting times, 
while Metropolis requires l/a(T) attempted spin flip operations on aver- 
age. Equating the two expressions yields a(T^) ~ 1/7 « 0.15 and T5 ss 1.5. 
The actually measured value of T& is indicated by the lower corner of the 
box in Fig. |3| and equals ~ 1.3. This is somewhat lower than the estimate, 
likely because we neglected the computational cost of order (z + 1) log N, 
confirmed in Fig. |], which is needed in the search and update procedure. 
From Fig. ^ we see that at T = 0.5 the WTM is approximately three times 
as fast as standard Metropolis. A closer examination of a(T) reveals that at 
T ~ 0.25 and T ~ 0.035 the WTM is 10 and 100 times faster, respectively, 
if both methods are to produce the same number of spin flips. 

Performance comparisons 

We have compared the WTM with an annealing version and with EO. The 
result is shown in Fig. g. In the annealing case we have used the Geman- 
Geman logarithmic schedule T(t) = Tq/(1 + log(l + £)). We observed the 
reasonable fact that (ebsf) is minimized when the initial temperature To is 
chosen so that the final temperature is slightly below T opt in the WTM. 
The WTM is seen to be inferior to EO in terms of spin flips needed to 
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Figure 4: WTM on a 500MHz Pentium III. The plot shows the logarithmic increase 
in runtime as a function of the system size N, due to the updating of the binary 
tree in which the spins are stored according to their flipping times. 



reach the same value of (ebsf)- However, when implementing EO ad literamj], 
at equal runtime EO and the WTM produce similar results for N = 10 3 , i.e. 
they reach approximately the same (ebsf}- For larger systems EO is slower 
than the WTM |9|. 

When applied to an annealing schedule like Geman-Geman, the WTM 
is seen to perform better than EO, since varying the temperature in the 
WTM does not change the runtime notably. We note in passing that while 
the Geman-Geman schedule is usually dismissed as unrealistically slow, it 
performs quite satisfactorily in connection with WTM. 



5 Summary and conclusions 

The WTM belongs to a familiy of rejectionless algorithms related to the 
n-fold way of Ref . [|ll]] . These algorithms asymptotically sample the Boltz- 
mann distribution and are mathematically equivalent to the ubiquitous 
Metropolis algorithm. Compared to Metropolis, they also require a com- 
putational and implementation overhead. However they are considerably 

3 which is usually not done J12| , hjfl , since an exact ordering is costly and an approximate 
ordering is assumed to work just as well. 
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Figure 5: A comparison of the energy per spin (£bsf), averaged over runs on 200 
different spin glasses, for three different methods on a N = 10 3 Gaussian spin 
glass. The legend specifies the EO exponent, the WTM temperature and the initial 
temperature for the Geman-Geman schedule. 

faster at low temperatures, especially in systems with short range interac- 
tions. The n-fold way, in particular, is efficient if n is small, i.e. if the possible 
values of the A& belong to a small, discrete set. The WTM method does 
not suffer from the same limitation, and is therefore especially well suited 
for low T simulations of the dynamics of disordered and glassy systems. 

Our tests on a spin glass system show that Simulated Annealing with 
Geman-Geman type of schedule is slightly faster Extremal Optimization, 
a recent addition to the arsenal of optimization tools. This indicates that 
'thermal' schemes remain a viable approach to global optimization. 
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